Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.

Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
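To make the idea of a deterministic, order-invariant initialization concrete, here is a minimal Python sketch. It uses a maximin-style rule (first center is the point with the largest Euclidean norm, each further center is the point farthest from the centers chosen so far), picked purely for illustration; it is not claimed to be any of the six methods evaluated in the chapter. The function names `maximin_init`, `lloyd`, and `sse` are hypothetical, and order-invariance holds only when there are no exact ties.

```python
import numpy as np

def sse(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def maximin_init(points, k):
    """Illustrative deterministic seeding; costs O(n * k) distance evaluations."""
    # First center: the point with the largest squared Euclidean norm.
    first = points[np.argmax((points ** 2).sum(axis=1))]
    centers = [first]
    # d2[i] holds the squared distance from point i to its nearest chosen center.
    d2 = ((points - first) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = points[np.argmax(d2)]  # farthest point from the current centers
        centers.append(nxt)
        d2 = np.minimum(d2, ((points - nxt) ** 2).sum(axis=1))
    return np.array(centers, dtype=float)

def lloyd(points, centers, iters=100):
    """Plain batch k-means (Lloyd) iterations starting from the given centers."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers
```

Because the seeding involves no random draws, a single k-means run suffices, which is the computational saving the abstract refers to when it contrasts deterministic methods with multiple random restarts.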

arXiv.org e-Print Archive

Crossref

Cluster analysis is an important task in data mining. It deals with the problem of organizing a collection of objects into clusters based on a similarity measure. Various distance functions can be used to define the similarity measure. Cluster analysis problems with the similarity measure defined by the squared Euclidean distance, which is also known as the minimum sum-of-squares clustering, have been studied extensively over the last five decades. The L1 and L∞ norms have attracted less attention. In this chapter, we consider a nonsmooth nonconvex optimization formulation of the cluster analysis problems. This formulation allows one to easily apply similarity measures defined using different distance functions. Moreover, an efficient incremental algorithm can be designed based on this formulation to solve the clustering problems. We develop incremental algorithms for solving clustering problems where the similarity measure is defined using the L1, L2, and L∞ norms. We also consider different algorithms for solving nonsmooth nonconvex optimization problems in cluster analysis. The proposed algorithms are tested using several real-world data sets and compared with other similar algorithms.
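The abstract does not spell out the formulation, so the following Python sketch assumes one common form: the objective is the average, over all data points, of the distance from each point to its nearest cluster center. The inner minimum over centers is what makes such an objective nonsmooth and nonconvex, and swapping the distance function changes the similarity measure without changing the overall structure. The function name `cluster_objective` and the `norm` labels are hypothetical.

```python
import numpy as np

def cluster_objective(points, centers, norm="l2sq"):
    """Average distance from each point to its nearest center (assumed formulation)."""
    # diff[i, j, :] is the coordinate-wise difference between point i and center j.
    diff = points[:, None, :] - centers[None, :, :]
    if norm == "l1":
        d = np.abs(diff).sum(axis=2)   # L1 (Manhattan) distance
    elif norm == "l2sq":
        d = (diff ** 2).sum(axis=2)    # squared Euclidean distance
    elif norm == "linf":
        d = np.abs(diff).max(axis=2)   # L-infinity (Chebyshev) distance
    else:
        raise ValueError(f"unknown norm: {norm}")
    # The min over centers is the nonsmooth part of the objective.
    return d.min(axis=1).mean()

points = np.array([[0.0, 0.0], [3.0, 4.0]])
centers = np.array([[0.0, 0.0]])
for name in ("l1", "l2sq", "linf"):
    print(name, cluster_objective(points, centers, norm=name))
```

With the `l2sq` option this reduces to the minimum sum-of-squares clustering objective (scaled by the number of points) mentioned in the abstract.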

Crossref

Federation ResearchOnline

Fuzzy C-Means (FCM) Clustering Algorithm: A Decade Review from 2000 to 2014

Authors: A Mukhopadhyay, A Staiano, A Webb, AT Azara, C Qiu, C Xu, CG Looney, CJ Zhu, D Dovžan, D Horta, DC Park, DM Vargas, DTC Lai, E Alpaydin, EH Ruspini, F Guoyao, F Zhao, GE Tsekouras, H Fritz, IA Maraziotis, IB Aydilek, J Fan, J Lazaro, J Nayak, JC Bezdek, JC Dunn, JH Pei, JL Fan, JM Łęski, JP Mei, JZC Lai, K Li, KL Wu, KR Sudha, KS Chuang, LM Wei, LS Iliadis, M Ceccarelli, M Hassana, M Huang, M Kuhne, M Ozer, MA Balafar, MA Sancheza, MH Asyali, MN Ahmed, MRP Ferreiraa, MS Kamel, MS Sua, MS Yang, N Belacel, N Bharill, OS Pianykh, P He, PL Lin, PN Tan, Q Liu, RE Bellman, RL Cannon, S Icer, S Mitra, S Miyamoto, S Shamshirband, S Silva, SA Mingoti, SC Chen, SM Chen, SR Kannan, T Geweniger, V Ravi, W Cai, W Halberstadt, W Pedrycz, WD Kim, WL Hung, WX Xie, X Kong, X Li, X Wang, XC Yu, XF Yin, Y Dong, Y He, Y Yan, Y Özbay, Z Ji, Z Xue, Z Zeng, ZH Inan, ZS Xu, ZX Ji
Publication venue: 'Springer Science and Business Media LLC'

Crossref